A Web Service for Video Summarization
This paper presents a Web service that supports the automatic generation of video summaries for user-submitted videos. The developed Web application decomposes the video into segments, evaluates the fitness of each segment to be included in the video summary and selects appropriate segments until a pre-defined time budget is filled. The integrated deep-learning-based video analysis and summarization technologies exhibit state-of-the-art performance and, by exploiting the processing capabilities of modern GPUs, offer faster-than-real-time processing. Configurations for generating video summaries that fulfill the specifications for posting on the most common video sharing platforms and social networks are available in the user interface of this application, enabling the one-click generation of distribution-channel-specific summaries.
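The selection step described above can be illustrated with a small greedy sketch. It assumes each segment already carries a fitness score from some model and picks the highest-scoring segments that still fit within the time budget. The `Segment` record and `select_segments` function are hypothetical names, and the greedy rule is one simple choice, not necessarily the strategy the service uses (a 0/1 knapsack over segment durations is a common alternative).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    # Hypothetical segment record: start time and duration in seconds,
    # plus a fitness score assumed to come from a summarization model.
    start: float
    duration: float
    score: float


def select_segments(segments: List[Segment], budget: float) -> List[Segment]:
    """Greedily pick the highest-scoring segments that still fit in the budget.

    The chosen segments are returned in temporal order so they can be
    concatenated into the summary video.
    """
    chosen: List[Segment] = []
    used = 0.0
    # Visit segments from most to least fit; skip any that would overflow.
    for seg in sorted(segments, key=lambda s: s.score, reverse=True):
        if used + seg.duration <= budget:
            chosen.append(seg)
            used += seg.duration
    # Restore the original playback order.
    return sorted(chosen, key=lambda s: s.start)
```

For example, with a 12-second budget and segments scored 0.9 (6 s), 0.7 (5 s), 0.5 (3 s) and 0.2 (4 s), the sketch keeps only the first two (11 s in total) and returns them in playback order.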
Deep Domain-Adversarial Image Generation for Domain Generalisation
Machine learning models typically suffer from the domain shift problem when
trained on a source dataset and evaluated on a target dataset of different
distribution. To overcome this problem, domain generalisation (DG) methods aim
to leverage data from multiple source domains so that a trained model can
generalise to unseen domains. In this paper, we propose a novel DG approach
based on Deep Domain-Adversarial Image Generation (DDAIG). Specifically,
DDAIG consists of three components, namely a label classifier, a domain
classifier and a domain transformation network (DoTNet). The goal of DoTNet is
to map the source training data to unseen domains. This is achieved by
formulating a learning objective that ensures the generated data can be
correctly classified by the label classifier while fooling the domain
classifier. By augmenting the source training data with the generated unseen
domain data, we can make the label classifier more robust to unknown domain
changes. Extensive experiments on four DG datasets demonstrate the
effectiveness of our approach.
Comment: 8 pages
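The DoTNet objective described above can be sketched as a loss that rewards correct label predictions and rewards wrong domain predictions on the transformed data. This is a minimal, framework-free sketch under stated assumptions: it works on raw logits for a single example, the names `dotnet_objective` and `lambda_domain` are hypothetical, and subtracting a weighted domain cross-entropy is one common way to express "fooling" a classifier, not necessarily the exact formulation used in the paper.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for a single example (log-sum-exp stabilized)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def dotnet_objective(label_logits: Sequence[float], class_label: int,
                     domain_logits: Sequence[float], domain_label: int,
                     lambda_domain: float = 1.0) -> float:
    """Hypothetical loss for the transformation network on one transformed image.

    The label term is small when the label classifier still predicts the
    correct class; subtracting the weighted domain term lowers the loss when
    the domain classifier fails to recognise the source domain.
    """
    label_loss = cross_entropy(label_logits, class_label)
    domain_loss = cross_entropy(domain_logits, domain_label)
    return label_loss - lambda_domain * domain_loss
```

Under this assumption, the value would be minimised with respect to the transformation network only, while the label and domain classifiers would be trained with their own losses.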
Semi-Supervised and Long-Tailed Object Detection with CascadeMatch
This paper focuses on long-tailed object detection in the semi-supervised
learning setting, which poses realistic challenges but has rarely been studied
in the literature. We propose a novel pseudo-labeling-based detector called
CascadeMatch. Our detector features a cascade network architecture, which has
multi-stage detection heads with progressive confidence thresholds. To avoid
manually tuning the thresholds, we design a new adaptive pseudo-label mining
mechanism to automatically identify suitable values from data. To mitigate
confirmation bias, where a model is negatively reinforced by its own incorrect
pseudo-labels, each detection head is trained on the ensembled pseudo-labels
of all detection heads. Experiments on two long-tailed
datasets, LVIS and COCO-LT, demonstrate that CascadeMatch surpasses
existing state-of-the-art semi-supervised approaches in long-tailed object
detection across a wide range of detection architectures. For
instance, CascadeMatch outperforms Unbiased Teacher by 1.9 AP Fix on LVIS when
using a ResNet50-based Cascade R-CNN structure, and by 1.7 AP Fix when using
Sparse R-CNN with a Transformer encoder. We also show that CascadeMatch can
even handle the challenging sparsely annotated object detection problem.
Comment: International Journal of Computer Vision (IJCV), 202
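The ensemble pseudo-labelling and adaptive threshold mining described above can be illustrated on toy class probabilities. In this sketch each candidate box has one probability vector per detection head; the heads' outputs are averaged, and a box becomes a pseudo-label if its top ensemble score reaches a per-class threshold. The threshold rule (the mean top score of each class's boxes) and the names `ensemble_scores`, `mine_thresholds` and `pseudo_labels` are hypothetical stand-ins: the abstract does not specify how CascadeMatch mines its thresholds or how the progressive per-stage thresholds are combined.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# One box = one probability vector per detection head.
Box = Sequence[Sequence[float]]


def ensemble_scores(head_probs: Box) -> List[float]:
    """Average the per-class probabilities of all detection heads for one box."""
    n_heads = len(head_probs)
    return [sum(col) / n_heads for col in zip(*head_probs)]


def top_class(scores: Sequence[float]) -> int:
    """Index of the highest ensemble score."""
    return max(range(len(scores)), key=scores.__getitem__)


def mine_thresholds(boxes: Sequence[Box]) -> Dict[int, float]:
    """Hypothetical adaptive rule: per-class mean of top ensemble scores."""
    per_class: Dict[int, List[float]] = defaultdict(list)
    for head_probs in boxes:
        scores = ensemble_scores(head_probs)
        cls = top_class(scores)
        per_class[cls].append(scores[cls])
    return {c: sum(v) / len(v) for c, v in per_class.items()}


def pseudo_labels(boxes: Sequence[Box],
                  thresholds: Dict[int, float]) -> List[Tuple[int, int, float]]:
    """Return (box index, class, score) for boxes passing their class threshold."""
    kept = []
    for i, head_probs in enumerate(boxes):
        scores = ensemble_scores(head_probs)
        cls = top_class(scores)
        # Classes without a mined threshold only pass at full confidence.
        if scores[cls] >= thresholds.get(cls, 1.0):
            kept.append((i, cls, scores[cls]))
    return kept
```

Per-class thresholds are one way a long-tailed setting might be served: rare classes whose predictions are less confident overall receive a lower bar than a single global threshold would give them.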